
International Journal of Epidemiology

Oxford University Press (OUP)

Preprints posted in the last 90 days, ranked by how well they match International Journal of Epidemiology's content profile, based on 74 papers previously published here. The average preprint has a 0.10% match score for this journal, so anything above that is already an above-average fit.

1
Biologically informed instrument selection for dietary Mendelian randomization using chemosensory receptor variants

Hwang, L.-D.; Lin, C.; Evans, D. M.; Martin, N. G.; Reed, D. R.; Joseph, P. V.

2026-02-06 epidemiology 10.64898/2026.02.05.26345702 medRxiv
Top 0.1%
33.9%

Background: Mendelian randomization (MR) is increasingly used for causal inference in nutritional epidemiology; however, dietary MR studies often rely on instruments statistically selected from genome-wide association studies of self-reported intake, which are vulnerable to pleiotropy and reverse causation and may violate core MR assumptions. We aimed to develop and evaluate a biologically informed framework for selecting valid genetic instruments for dietary exposures, based on genes encoding taste and olfactory receptors that mediate chemosensory inputs and shape food preferences and dietary behaviour. Methods: We prioritised 1,214 nonsynonymous variants in 30 taste and 295 olfactory receptor genes with minor allele frequency ≥1%. Associations with 140 food-liking traits were tested in UK Biobank participants aged 37 to 73 years. Candidate variants were evaluated using a multi-stage filtering pipeline designed to improve instrument validity. This included replication in an independent younger cohort (Avon Longitudinal Study of Parents and Children, age 25), concordance between food liking and intake, exclusion of associations with socioeconomic status, assessment of food specificity accounting for linkage disequilibrium and co-consumption patterns, and directionality testing to reduce reverse causation. Retained variants were applied as instruments in MR analyses to assess cardiometabolic outcomes. Results: We identified 268 nonsynonymous variants within 101 olfactory and 16 taste receptor genes associated with 96 food-liking traits. The filtering process yielded 28 candidate instruments for 24 foods. Among these, the instrument for onion liking uniquely satisfied all criteria for classification as high confidence. To demonstrate clinical relevance, genetically proxied onion liking was associated with lower blood pressure and a reduced risk of type 2 diabetes in MR analyses, with no evidence of effects on body mass index, glycaemic traits, or serum lipid levels.
Conclusions: Guiding genetic instrument selection using chemosensory receptor genes provides a biologically informed strategy for dietary Mendelian randomization that reduces susceptibility to pleiotropy and reverse causation. This framework enables more robust causal evaluation of diet-disease relationships and strengthens inference in nutritional epidemiology and public health research.

2
Validity and Interpretation of Two-Sample Mendelian Randomization with Binary Traits

Wu, Z.; Wang, J.

2026-02-18 genetics 10.1101/2024.06.09.598150 medRxiv
Top 0.1%
33.4%

Background: Two-sample Mendelian randomization (MR) is widely applied to binary exposures and outcomes. Yet standard MR models rely on linear effect assumptions that are difficult to interpret for binary traits. Although liability-based interpretations have been suggested, it remains unclear whether conventional summary-data MR is formally justified in this setting or what causal parameter it identifies. Methods: We develop a liability-threshold framework in which binary traits arise from underlying continuous liabilities. We derive explicit relationships between genome-wide association study (GWAS) coefficients obtained from logistic or linear regression on binary traits and marginal genetic associations on the liability scale. Under small genetic effects, typical for complex traits, observed-scale GWAS coefficients are approximately proportional to liability-scale associations. Results: This proportionality implies that standard two-sample MR methods remain statistically coherent for binary traits. MR applied to binary exposures or outcomes estimates a scaled causal effect between underlying liabilities rather than an effect on the observed binary scale. The scaling factor depends primarily on trait prevalence and is directly computable. Simulations and UK Biobank analyses confirm that, after rescaling, MR using binary traits recovers liability-scale causal effects consistent with analyses based on continuous traits. Conclusions: We provide a formal statistical justification for summary-data MR with binary traits and clarify the causal parameter being estimated. These results support routine MR practice for binary exposures and outcomes while enabling coherent interpretation of effect sizes. Key Messages:
- The interpretation of two-sample MR with binary exposures or outcomes is often unclear because GWAS analyses are performed on the observed binary scale.
- Under a liability threshold framework with small genetic effects, GWAS coefficients from logistic or linear regression on binary traits are approximately proportional to genetic associations on an underlying continuous liability scale.
- Consequently, conventional summary-data MR applied to binary or ordinal traits remains valid and estimates a scaled causal effect between liabilities, requiring no modification of existing methods.
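As a rough illustration of a prevalence-dependent scaling factor of the kind this abstract describes, the sketch below uses the classical liability-threshold approximation for a linear regression on a 0/1 trait: for small effects, the observed-scale coefficient is roughly the liability-scale coefficient multiplied by the standard normal density at the threshold. This textbook form is an assumption made for illustration and may differ from the factor derived in the preprint; all function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def liability_threshold(prevalence: float) -> float:
    """Threshold t such that P(liability > t) equals the trait prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(1.0 - prevalence)


def observed_to_liability_factor(prevalence: float) -> float:
    """Approximate ratio beta_observed / beta_liability (small-effect regime).

    Assumption: linear regression on the binary trait, unit-variance liability.
    """
    return _STD_NORMAL.pdf(liability_threshold(prevalence))


def to_liability_scale(beta_observed: float, prevalence: float) -> float:
    """Rescale an observed-scale coefficient to the liability scale."""
    return beta_observed / observed_to_liability_factor(prevalence)


if __name__ == "__main__":
    # Hypothetical prevalences, chosen only to show how the factor varies.
    for k in (0.01, 0.05, 0.20, 0.50):
        print(f"prevalence={k:.2f}  factor={observed_to_liability_factor(k):.4f}")
```

Because the factor depends only on prevalence under this approximation, the same rescaling applies to every variant for a given trait, which is consistent with the abstract's point that existing MR methods need no modification.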

3
Machine learning-based prediction of cardiovascular disease risk in Africa using WHO Stepwise Surveys: 2014-2019

Ng'ambi, W.; Merzouki, A.; Estill, J. G.; Keiser, O. G.; Orel, E.

2026-02-26 epidemiology 10.64898/2026.02.23.26346870 medRxiv
Top 0.1%
26.5%

Introduction: Cardiovascular diseases (CVDs) are the leading cause of death globally, with rising burdens in Africa due to ageing populations, lifestyle changes, and poor risk factor control. Conventional risk scores developed in high-income settings often perform poorly in African populations. Machine-learning (ML) approaches offer potential to improve prediction by capturing complex, non-linear interactions among demographic, behavioural, and biological factors. This study applies ML models to WHO STEPS survey data to generate context-specific CVD risk predictions across 12 African countries. Methods: We analysed data from 60,294 adults collected in WHO STEPS surveys between 2014 and 2019 across 12 African countries. Three ML models (Elastic Net logistic regression [LASSO], Random Forest [RF], and XGBoost [XGB]) were trained to predict self-reported CVD outcomes. Data were split into training (80%) and testing (20%) sets with five-fold cross-validation. Feature selection used the Boruta algorithm, and model performance was assessed via accuracy, sensitivity, specificity, AUC, F1 score, and Brier score. Results: Overall CVD prevalence was 5%. Hypertension emerged as the strongest predictor across all models, followed by alcohol-related harm. Tree-based models outperformed regression approaches and conventional clinical scores, with XGBoost achieving the highest discrimination (AUC=0.769), balanced accuracy (0.699), and calibration (Brier score=0.195). Predicted risk trajectories were smoother and more clinically plausible than Framingham or WHO/ISH scores, particularly across age, sex, and hypertension status. LASSO and Random Forest performed moderately, while conventional risk scores showed poor discrimination and marked miscalibration. Conclusion: Machine-learning approaches provide accurate, context-specific cardiovascular risk prediction in African populations.
By highlighting modifiable risk factors such as hypertension and alcohol-related harm, these models support targeted interventions aligned with WHO PEN, HEARTS, and SBIRT strategies. The African CVD Risk Prediction Tool translates complex data into actionable insights, offering a scalable platform for prevention-focused, equitable cardiovascular care across diverse African settings.

4
Data Resource Profile: EST-Health-30

Reisberg, S.; Oja, M.; Mooses, K.; Tamm, S.; Sild, A.; Talvik, H.-A.; Laur, S.; Kolde, R.; Vilo, J.

2026-04-24 epidemiology 10.64898/2026.04.21.26351087 medRxiv
Top 0.1%
23.0%

Background: The increasing availability of routinely collected health data offers new opportunities for population-level research, yet access to comprehensive, linked, and standardised datasets remains limited. We describe EST-Health-30, a large-scale, population-representative health data resource from Estonia. Methods: EST-Health-30 comprises a random 30% sample of the Estonian population (~500,000 individuals), with longitudinal data from 2012 to 2024 and annual updates planned through 2026. Individual-level records are linked across five nationwide databases, including electronic health records, health insurance claims, prescription data, cancer registry, and cause of death records. A privacy-preserving hashing approach ensures consistent cohort inclusion over time while maintaining pseudonymisation. All data are harmonised to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (version 5.4) using international standard vocabularies. Data quality was assessed using established OMOP-based validation frameworks. Results: The dataset contains rich multimodal information on diagnoses, procedures, laboratory measurements, prescriptions, free-text clinical notes, healthcare utilisation, and costs, with high population coverage and longitudinal depth. Data quality assessment showed high completeness and consistency, with 99.2% of applicable checks passing. The age-sex distribution closely reflects the national population, supporting representativeness, though coverage is marginally below the target 30% (29.2%), primarily attributable to recent immigrants without health system contact. The dataset enables construction of detailed clinical cohorts, analysis of disease trajectories, and evaluation of healthcare utilisation and outcomes across the life course. Conclusions: EST-Health-30 is a comprehensive, standardised, and population-representative real-world data resource that supports epidemiological, clinical, and methodological research. 
Its alignment with the OMOP CDM facilitates reproducible analytics and participation in international federated research networks, while secure access infrastructure ensures compliance with data protection regulations.

5
Ethnic Differences in the Timing and Incidence of Childhood Health Conditions: Evidence from the Born in Bradford Cohort

Santorelli, G.; Cheung, R. W.; Bhopal, S.; Wright, J.

2026-04-01 epidemiology 10.64898/2026.03.31.26349839 medRxiv
Top 0.1%
22.9%

Objective: To examine ethnic differences in the incidence and age-related trajectories of childhood health conditions from birth to adolescence within a UK birth cohort. Design: Longitudinal population-based birth cohort with linkage to primary care electronic health records. Setting: Born in Bradford (BiB), a multi-ethnic birth cohort in Bradford, UK. Participants: 13,282 children (36% White British, 44% Pakistani British, 20% other ethnicity) born 2007 to 2011 with linked primary care records and over 1 year follow-up. Main outcome measures: Incident diagnoses of atopic conditions (asthma, eczema, allergic rhinoconjunctivitis), overweight/obesity, common mental health disorders (anxiety, depression), and neurodevelopmental disorders (including ADHD and autism). Incidence rates, Kaplan-Meier cumulative incidence, and Cox regression hazard ratios (HRs) were estimated. Results: Atopic conditions emerged early (median onset 5 to 6 years) and were more common among Pakistani British children, with higher hazards of eczema (HR 2.29, 95% CI 2.01 to 2.61), allergic rhinoconjunctivitis (HR 2.27, 2.00 to 2.58), and asthma (HR 1.35, 1.22 to 1.50). Overweight/obesity developed later (median 9 to 10 years) and was also more frequent in Pakistani British children (HR 1.25, 1.16 to 1.35). In contrast, common mental health disorders emerged predominantly in early adolescence (median around 13 years), and both mental health and neurodevelopmental diagnoses were more frequently recorded among White British children; Pakistani British children had lower hazards of neurodevelopmental diagnoses (HR 0.28, 0.23 to 0.35) and mental health disorders (HR 0.53, 0.41 to 0.70). Conclusions: Ethnic differences in childhood health are condition-specific and vary by age of onset, emerging at distinct stages. These findings inform the timing of prevention, service planning, and research into underlying mechanisms.

6
The Robust Bidirectional Association Between Chronic Lung Disease and Incident Osteoporosis: A Two-Stage Individual Participant Data Meta-Analysis of Three International Longitudinal Cohorts (HRS, SHARE, and ELSA)

Jiang, D.; Bao, J.

2026-03-19 respiratory medicine 10.64898/2026.03.18.26348689 medRxiv
Top 0.1%
22.5%

Background: The association between chronic lung disease (CLD) and osteoporosis (OP) is well-recognized, but the direction and magnitude of this relationship remain debated, particularly in aging populations. We aimed to quantify the bidirectional association between CLD (including COPD and asthma) and incident OP using a two-stage individual participant data (IPD) meta-analysis of three large longitudinal cohorts. Methods: We harmonized and analyzed individual-level data from the Health and Retirement Study (HRS, USA), the Survey of Health, Ageing and Retirement in Europe (SHARE, Europe), and the English Longitudinal Study of Ageing (ELSA, UK), all comprising adults aged ≥50 years. In the first stage, Cox proportional hazards models were fitted separately in each cohort to estimate hazard ratios (HRs) for the forward (CLD→OP) and reverse (OP→CLD) associations, adjusting for a comprehensive set of confounders (demographics, lifestyle, comorbidities, functional status). In the second stage, cohort-specific log HRs were pooled using fixed-effect meta-analysis. Heterogeneity was assessed with the I-squared statistic. Results: A total of 40,050 participants were included across the three cohorts. The pooled HR for incident OP among individuals with baseline CLD was 1.37 (95% confidence interval [CI] 1.24-1.51), with similar estimates for COPD (HR 1.47, 95% CI 1.27-1.69) and asthma (HR 1.35, 95% CI 1.22-1.50). For the reverse association, baseline OP was associated with increased risk of incident CLD (pooled HR 1.16, 95% CI 1.05-1.29), COPD (HR 1.28, 95% CI 1.11-1.47), and asthma (HR 1.17, 95% CI 1.05-1.30). Heterogeneity was low across all analyses (I² ≤ 7.5%). Conclusion: This two-stage IPD meta-analysis provides robust evidence of a bidirectional relationship between CLD and OP in older adults. These findings underscore the need for integrated screening and management of both conditions in aging populations.
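The second-stage pooling described in this abstract can be sketched as a standard inverse-variance fixed-effect meta-analysis of log hazard ratios, with I² computed from Cochran's Q. This is a minimal sketch under that assumption, not the authors' code; the function names are hypothetical, and the cohort inputs in the demo are placeholders rather than the study's cohort-specific estimates.

```python
import math
from typing import List, Sequence, Tuple

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_hr_and_se(hr: float, lower: float, upper: float) -> Tuple[float, float]:
    """Recover the log HR and its standard error from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    return math.log(hr), se


def pool_fixed_effect(estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Inverse-variance pooled HR with its 95% CI, from (log HR, SE) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return (
        math.exp(pooled),
        math.exp(pooled - Z_95 * pooled_se),
        math.exp(pooled + Z_95 * pooled_se),
    )


def i_squared(estimates: Sequence[Tuple[float, float]]) -> float:
    """I-squared (%) from Cochran's Q, truncated at zero."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0.0 or df <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    # Placeholder cohort-level results (HR, lower, upper); illustrative only.
    cohorts: List[Tuple[float, float, float]] = [
        (1.40, 1.20, 1.63),
        (1.33, 1.12, 1.58),
        (1.38, 1.10, 1.73),
    ]
    stage_one = [log_hr_and_se(*c) for c in cohorts]
    print(pool_fixed_effect(stage_one), i_squared(stage_one))
```

A fixed-effect model assumes one common underlying effect across cohorts, which is easier to defend when heterogeneity is low, as the abstract reports.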

7
Probability of causation in individual workers: Lung cancer due to occupational exposure to asbestos

Mancilla-Galindo, J.; Peters, S.; Deng, H.; van der Molen, H. F.; Kromhout, H.; Portengen, L.; Vermeulen, R.; Heederik, D.

2026-02-09 occupational and environmental health 10.64898/2026.02.06.26345596 medRxiv
Top 0.1%
19.0%

Background: Lung cancer compensation systems for occupational exposure to asbestos commonly apply Helsinki criteria, which assume 4% excess lung cancer risk per fibre-year of asbestos exposure. The Probability of Causation (PoC) is ≥50% at 25 fibre-years (risk doubling threshold). Large case-control studies have suggested steeper exposure-response relations at lower exposures. We aimed to estimate PoC of asbestos-related lung cancer to evaluate exposure thresholds for compensation of lung cancer cases occupationally exposed to asbestos. Methods: Relative risk of asbestos-related lung cancer was estimated using two approaches: (1) a meta-regression of 22 occupational studies forming the core evidence on cumulative asbestos exposure and lung cancer since the 1980s (130,341 participants); and (2) a meta-analysis of the recently conducted SYNERGY pooled case-control study (14 studies, 37,866 participants), adjusted for age, sex, smoking, and study. The likelihood that lung cancer was caused by asbestos was estimated as the PoC with 95% prediction intervals (95%PI). Results: Occupational cohort studies produced a shallow exposure-response relation with substantial heterogeneity (I² = 92.7%). SYNERGY showed a steeper relation with 6.8% (95%PI: 0%-17.7%) lung cancer risk increase per fibre-year and lower heterogeneity (I² = 63.4%). PoC ≥50% occurred at 62.93 (point estimate) and 18.2 fibre-years (upper 95%PI) for occupational asbestos studies, compared to 10.5 and 4.3, respectively, in SYNERGY. Conclusions: The SYNERGY pooled case-control study provided exposure-response estimates that are more representative of current exposure to lower mixed asbestos fibres in the Netherlands, supporting lower exposure thresholds than the existing Helsinki criteria when estimating PoC in compensation contexts.
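The risk-doubling logic in this abstract follows from the usual definition PoC = (RR - 1) / RR, which reaches 50% when RR = 2. The sketch below works through that arithmetic for two assumed exposure-response shapes. The linear form reproduces the Helsinki figure of 25 fibre-years at 4% excess risk per fibre-year. The log-linear form is an assumption (the abstract does not state the model used) that lands close to the SYNERGY point estimate of about 10.5 fibre-years for a 6.8% increase per fibre-year. Function names are hypothetical.

```python
import math


def probability_of_causation(relative_risk: float) -> float:
    """PoC = (RR - 1) / RR, floored at zero when RR <= 1."""
    if relative_risk <= 0.0:
        raise ValueError("relative risk must be positive")
    return max(0.0, (relative_risk - 1.0) / relative_risk)


def rr_linear(excess_per_fibre_year: float, fibre_years: float) -> float:
    """Linear excess relative risk: RR = 1 + slope * exposure."""
    return 1.0 + excess_per_fibre_year * fibre_years


def rr_log_linear(increase_per_fibre_year: float, fibre_years: float) -> float:
    """Log-linear (multiplicative) form: RR = (1 + increase) ** exposure."""
    return (1.0 + increase_per_fibre_year) ** fibre_years


def doubling_exposure_linear(excess_per_fibre_year: float) -> float:
    """Exposure at which RR = 2 (PoC = 50%) under the linear form."""
    return 1.0 / excess_per_fibre_year


def doubling_exposure_log_linear(increase_per_fibre_year: float) -> float:
    """Exposure at which RR = 2 (PoC = 50%) under the log-linear form."""
    return math.log(2.0) / math.log(1.0 + increase_per_fibre_year)


if __name__ == "__main__":
    print(doubling_exposure_linear(0.04))                  # Helsinki assumption
    print(probability_of_causation(rr_linear(0.04, 25)))   # PoC at 25 fibre-years
    print(round(doubling_exposure_log_linear(0.068), 1))   # assumed log-linear form
```

The published thresholds also rest on prediction intervals and study-level modelling that this arithmetic does not capture.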

8
Large-Scale Multi-Omics Enhance Risk Prediction for Type 2 Diabetes

Xie, R.; Herder, C.; Schoettker, B.

2026-02-20 epidemiology 10.64898/2026.02.19.26346636 medRxiv
Top 0.1%
18.8%

Introduction: Polygenic risk scores (PRS), metabolomics, and proteomics have each shown promise in improving type 2 diabetes risk prediction, but their combined utility beyond established clinical models remains unclear. We aimed to evaluate whether integrating multi-omics biomarkers enhances 10-year type 2 diabetes risk prediction beyond single-omics extensions and the clinical Cambridge Diabetes Risk Score (CDRS), which includes HbA1c measurements. Methods: We analysed data from 23,325 UK Biobank participants without diagnosed diabetes at baseline. Data for a PRS for type 2 diabetes, 11 metabolites, and 15 proteins were added to the CDRS to develop multi-omics prediction models. Model performance was evaluated using Harrell's C-index and the net reclassification index (NRI). Results: During 10 years of follow-up, 719 participants developed incident type 2 diabetes. Among individual omics layers, proteomics contributed the greatest improvement in predictive performance, increasing the C-index from 0.857 (clinical CDRS) to 0.880 (ΔC-index, +0.023; P < 0.001), with an NRI of 30.0%. The full multi-omics model further significantly increased the C-index compared to a model combining the clinical CDRS with proteomics data (C-index, 0.886; ΔC-index, +0.006; P < 0.033). Conclusion: Integrating proteomics, metabolomics, and a diabetes-PRS into a clinical model substantially improves type 2 diabetes risk prediction beyond single-omics extensions. However, the C-index difference between the proteomics-extended and full multi-omics-extended models is small, and the clinical model extended with proteomics data would be easier to translate into routine care because it needs only the measurement of 15 proteins.

9
Cervical Cancer Screening Uptake in Africa: A Multi-Country Analysis of WHO STEPS Data, 2014-2019

Mulenga, E.; Ng'ambi, W.; Chigere, A.; Mutasha, S.; Zyambo, C.

2026-03-02 epidemiology 10.64898/2026.02.27.26347296 medRxiv
Top 0.1%
18.6%

Background: Cervical cancer remains a significant global public health challenge, with the overwhelming majority of its burden borne by low- and middle-income countries. Globally, an estimated 660,000 new cases and 350,000 deaths occur each year, with more than 90% of cervical cancer-related mortality concentrated in resource-limited settings. In Africa, limited access to organized screening programs and early detection services continues to contribute to persistently high incidence and mortality rates, despite the preventable nature of the disease. Methods: We conducted a cross-sectional analysis of WHO STEPwise (STEPS) survey data collected between 2014 and 2019 from 11 African countries. The analysis included 25,471 women aged 15 years and older. Weighted prevalence estimates were calculated, and multivariable logistic regression models were fitted to identify factors associated with ever having been screened for cervical cancer. Predicted probabilities were estimated and stratified by age and residence. Results: The pooled prevalence of cervical cancer screening uptake was approximately 10.0%. Uptake was consistently higher among urban women than rural women across all age groups. In adjusted analyses, screening uptake increased strongly with age, peaking at 50-54 years (AOR = 8.21; 95% CI: 5.55-12.14). Higher education showed a clear positive gradient, with tertiary education associated with more than threefold higher odds compared with no education (AOR = 3.20; 95% CI: 2.65-3.86). Urban residence was associated with higher uptake (AOR = 1.22; 95% CI: 1.11-1.34). Substantial cross-country variation was observed, with higher odds in Botswana (AOR = 6.58; 95% CI: 5.51-7.86) and markedly lower odds in Benin (AOR = 0.08; 95% CI: 0.05-0.14). Hypertension was positively associated with screening uptake, while low fruit and vegetable intake was inversely associated. Conclusions: Cervical cancer screening uptake in Africa remains critically low and unevenly distributed.
Addressing age, educational, urban-rural, and country-level disparities is essential to achieving WHO elimination targets.

10
Homicide in Pregnant and Postpartum versus Nonpregnant and Nonpostpartum Populations: Re-estimation of a Rate Ratio using a Person-time Framework

McNellan, C. R.; Marquez, N.; Alexander, M.

2026-01-26 epidemiology 10.64898/2026.01.25.26344756 medRxiv
Top 0.1%
17.1%

We aim to re-estimate the national homicide rate ratio between nonpregnant/nonpostpartum and pregnant/postpartum women accounting for person-time exposure, which prior studies overlooked. Using a theoretical framework for descriptive epidemiology, we complete a retrospective analysis to estimate the pregnancy-associated homicide rate and re-estimate the national homicide rate ratio between pregnant/postpartum and nonpregnant/nonpostpartum populations in 2020. We use National Vital Statistics System death, fetal death, birth, and Census Bureau data to identify the population at risk. We compare mortality rates and 95% confidence intervals overall and stratified by race, ethnicity, and age. Among the 9,905,908 pregnancies contributing person-time, there were 185 homicides. The relative homicide risk was 35% higher among nonpregnant/nonpostpartum compared to pregnant/postpartum populations. Pregnancy was only associated with elevated risk among ages 10-19 (homicide rate ratio 3.82; 95% CI 2.39-5.77). Homicide rate ratios between nonpregnant/nonpostpartum and pregnant/postpartum women calculated accounting for exposure time and pregnancy transitions contradict previous estimates. Accurate assessment of mortality rates is essential to develop strategies protective against maternal mortality.

11
Using Negative Control Outcomes to Detect Selection Bias in Mendelian Randomization Studies

Gkatzionis, A.; Davey Smith, G.; Tilling, K.

2026-02-01 epidemiology 10.64898/2026.01.30.26345215 medRxiv
Top 0.1%
15.0%

Mendelian randomization is currently mainly implemented through the use of genetic variants as instrumental variables to investigate the causal effect of an exposure on an outcome of interest. Mendelian randomization studies are robust to confounding bias and reverse causation, but they remain susceptible to selection bias; for example, this can happen if the exposure or outcome are associated with selection into the study sample. Negative controls are sometimes used to detect biases (typically due to confounding) in observational studies. Here, we focus specifically on Mendelian randomization analyses and discuss under what conditions a variable can be used as a negative control outcome to detect selection mechanisms that could bias Mendelian randomization estimates. We show that the main requirement is that the negative control outcome relates to confounders of the exposure and outcome. Counter-intuitively, the effect of the negative control on selection is of secondary concern; for example, a variable that does not affect selection can be a valid negative control for an outcome that does. We also investigate under what conditions age and sex can be used as negative control outcomes in Mendelian randomization analyses. In a real-data application, we investigate the pairwise causal relationships between 19 traits, utilizing data from the UK Biobank. Treating biological sex as a negative control outcome, we identify selection bias in analyses involving commonly used traits such as alcohol consumption, body mass index and educational attainment.

12
Methodological Considerations in Sibling Analyses of Prenatal Acetaminophen

Ahlqvist, V. H.; Sjoqvist, H.; Gardner, R. M.; Lee, B. K.

2026-03-30 epidemiology 10.64898/2026.03.27.26349515 medRxiv
Top 0.1%
15.0%

Background: Sibling-matched designs control for shared familial confounding but remain vulnerable to non-shared confounders. Bi-directional sensitivity analyses, which stratify families by whether the older or younger sibling was exposed, are commonly used to assess carryover effects. We aimed to demonstrate how this methodological approach can introduce severe confounding by parity. Methods: We conducted simulations motivated by a recent epidemiological study. The true causal effect of a hypothetical exposure (prenatal acetaminophen) on neurodevelopmental outcomes was set to strictly null. To introduce parity-related confounding, baseline exposure and outcome probabilities were varied slightly by birth order. We compared conditional logistic regression effect estimates from total sibling models against bi-directional stratified models. Results: In the total simulated sibling cohort, models yielded the true null effect (odds ratio = 1.00) when adjusting for parity. However, the bi-directional analyses exhibited divergent artifactual signals. Because parity is perfectly collinear with exposure in these stratified subsets, it cannot be adjusted for. For example, when the older sibling was exposed, the odds ratio for autism spectrum disorder was 1.68; when the younger was exposed, the odds ratio was 0.60. Conclusions: Divergent estimates in bi-directional sibling analyses can be a predictable artifact of parity confounding rather than evidence of carryover effects or invalidating unmeasured bias. Overall sibling models adjusting for parity may remain robust despite divergent stratified sensitivity results.

13
First-time child protection contacts from 0 to 15 years in a whole-population cohort of Australian Aboriginal children born 2006-2020: a data linkage study

Hanly, M. J.; Newton, B.; Ahmed, T.; Payne, T.; Powell, M.; Cripps, K.; Katz, I.; Pilkington, R.; Lynch, J.; Gray, P.; Falster, K.

2026-03-26 epidemiology 10.64898/2026.03.24.26349231 medRxiv
Top 0.1%
14.9%

Background: First Nations children are over-represented in child protection systems in Australia and other colonised countries. Here, we apply a prevention and equity lens to the use of child protection data, to inform early opportunities to support Aboriginal children and families at risk of escalating child protection contact, from pregnancy to adolescence. Methods: We followed 15 whole-population cohorts (born 2006-2020) of Aboriginal (n=119,716) and non-Aboriginal (n=1,456,698) children in New South Wales (NSW), Australia, to December 2021, using birth and child protection datasets linked for the NSW Child E-Cohort. In each Aboriginal and non-Aboriginal cohort (2006-2020), we calculated the cumulative incidence (risk) of first-time child protection contacts from the prenatal period up to age 15 years: child concern reports, screened-in reports, investigations, child protection-defined substantiations, and out-of-home care (OOHC) placements. Risk differences and relative risks were also calculated. Findings: By birth, 10-15% of Aboriginal children born 2006-2020 had a first report to child protection, with 48-54% by age 5y (2006-2016 births), and 74% by age 15y (2006 births), with similar risks of screened-in reports (e.g. 68% by age 15y). The risk of first-time substantiation was 1-5% of Aboriginal children by birth, 17-20% by 5y, and 32% by 15y, with higher risks in more contemporary cohorts. By age 1y, 3-4% of Aboriginal children born 2006-2020 had a first OOHC placement, with 7-9% by 5y, and 14% by 15y. The risk differences between Aboriginal and non-Aboriginal children were 23 and 3 percentage points for reports and OOHC by age 1y (2020 births), respectively, increasing as children age. Interpretation: Despite extensive inquiries, calls for prevention and Closing the Gap targets, our study shows the lifetime risk of child protection involvement for Aboriginal families has not improved and inequities persist.
These findings support the call for Aboriginal-led approaches and greater investment in early supports for First Nations children and families. Research in Context. Evidence before this study: We searched PubMed and Medline for studies on the lifetime risk of child protection contacts among First Nations child populations, published January 2005 to May 2025. Thirteen studies reported various child protection contacts, from the perinatal period through childhood, among birth or synthetic cohorts of First Nations children, born between 1990 and 2018, created from population data sources in jurisdictions in Australia (n=5), the United States (US) (n=6), and Aotearoa/New Zealand (NZ) (n=2) (Table E1). [Table E1. Systematic Review Results: Details of 13 studies on the lifetime risk of child protection contacts among First Nations child populations, published January 2005 to May 2025.] The most recently published study included First Nations children born 2000 to 2013 in Western Australia, which quantified the risk of reports, investigations, substantiations and removals into OOHC, from age 1 to 16 years. By age 1, 12% were reported and 3% were removed into OOHC. By age 16, 52% were reported, and 14% were removed into OOHC. Prior studies of birth or synthetic cohorts of First Nations children born 1990-2018, in the US, NZ, and South Australia showed similar results. By age 5 years, risks were 16% to 54% for reports, 20% for investigations, 7% to 11% for substantiations and 8% for removals into OOHC. Among the five studies with cohorts followed to 18 years, 42% were reported, 28% to 50% were investigated, 9% to 27% were substantiated, 7% to 16% were removed into OOHC and 0.8% to 3.8% had termination of parental rights.
Added value of this study: This is the largest and most contemporary study to quantify the lifetime risk of child protection contact among whole populations of First Nations children internationally. Among 15 consecutive whole-population cohorts of First Nations children in New South Wales (NSW), Australia, born 2006 to 2020, we reported, for the first time, the full spectrum of child protection contacts, from the prenatal period. By birth, 16% were reported to child protection, 14% were investigated and 5% were substantiated in the most contemporary cohort born 2020. By age 1 year, 2.8% were removed into OOHC. In the oldest cohort born 2006, 74% were reported and 14.4% removed into OOHC by age 15 years. We also reveal the magnitude of the inequity in child protection contacts between First Nations and non-Indigenous children across the life course. For example, among 2006 births, the risk of first-time reports to child protection for Aboriginal and non-Aboriginal children, respectively, was 10.5% versus 1.5% by birth (risk difference (RD), 9 percentage points; risk ratio (RR), 7.0), 53% vs 16% by age five (RD, 38pp; RR, 3.4) and 74% vs 33% by age 15 (RD, 41pp; RR 2.2). Implications of all the available evidence: This study unequivocally shows that the lifetime risk of child protection involvement in the lives of First Nations families has not reduced in more contemporary whole-population cohorts and that inequities persist. This is consistent with evidence from prior studies internationally. It is critical that First Nations-led responses and investment in early family supports be at the centre of system reform to realise the long-called-for shift toward prevention and to redress the pervasive inequities experienced by First Nations children and families in colonised countries such as Australia.
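The risk differences and risk ratios quoted for the 2006 birth cohort follow directly from the two cumulative incidences being compared. The short sketch below shows the arithmetic using the rounded percentages given in this abstract; the function names are hypothetical. The published RD and RR were presumably computed from unrounded risks, so results from rounded inputs can differ slightly from the quoted values.

```python
def risk_difference_pp(risk_exposed: float, risk_reference: float) -> float:
    """Absolute risk difference in percentage points (inputs are proportions)."""
    return (risk_exposed - risk_reference) * 100.0


def risk_ratio(risk_exposed: float, risk_reference: float) -> float:
    """Relative risk: exposed-group risk divided by reference-group risk."""
    if risk_reference <= 0.0:
        raise ValueError("reference risk must be positive")
    return risk_exposed / risk_reference


if __name__ == "__main__":
    # Rounded first-report risks for 2006 births, as quoted in the abstract:
    # (label, Aboriginal children, non-Aboriginal children)
    comparisons = [
        ("by birth", 0.105, 0.015),
        ("by age 5", 0.53, 0.16),
        ("by age 15", 0.74, 0.33),
    ]
    for label, aboriginal, non_aboriginal in comparisons:
        rd = risk_difference_pp(aboriginal, non_aboriginal)
        rr = risk_ratio(aboriginal, non_aboriginal)
        print(f"{label}: RD = {rd:.1f} pp, RR = {rr:.2f}")
```

The two measures move in opposite directions with age in these figures: the absolute gap widens while the ratio shrinks, because the reference-group risk also rises.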

14
Mapping the Dynamic Interplay of Mental Health and Weight Across Childhood: Data-Driven Explorations Using Causal Discovery

Larsen, T. E.; Lorca, M. H.; Ekstrom, C. T.; Vinding, R.; Bonnelykke, K.; Strandberg-Larsen, K.; Petersen, A. H.

2026-04-17 epidemiology 10.64898/2026.04.16.26350943 medRxiv
Top 0.1%
14.7%

Childhood weight development, especially overweight and obesity, has been associated with mental health, but their dynamic, causal relationships, and whether these differ by sex, remain unclear. We applied causal discovery to data from the Danish National Birth Cohort (n=67,593) spanning six periods from pregnancy to late adolescence and considering 67 variables related to child and parental weight, mental health, lifestyle, and socio-economic factors. We found no statistically significant difference between the causal graphs for boys and girls (P=0.079). The data-driven models found causal influence of childhood weight on subsequent weight status. Mental health pathways were exclusively within or across adjacent periods and centered on early adolescent stress. We examined the interplay between a subset of mental health variables, containing information on externalizing and internalizing problems, and weight, and found no direct causal pathway between the two processes. These findings suggest that observed links between weight and these mental health measures may be attributable to confounding. Our findings demonstrate the value of data-driven causal discovery in large cohort studies and how to test for differences in causal mechanisms across subgroups. Results are available in an interactive application, enabling future research to further explore the interplay between weight and mental health.

15
Mother-infant linked UK electronic birth cohorts representing 17.5 million births harmonised to the OMOP common data model

Seaborne, M.; Durbaba, S.; Mendez-Villalon, A.; Giles, T.; Gonzalez-Izquierdo, A.; Hough, A.; Sanchez-Soriano, C.; Snell, H.; Cockburn, N.; Nirantharakumar, K.; Poston, L.; Reynolds, R.; Santorelli, G.; Brophy, S.

2026-03-25 public and global health 10.64898/2026.03.23.26349078 medRxiv
Top 0.1%
14.4%

We describe the harmonisation of five UK electronic birth cohorts to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, creating a large-scale, standardised resource for maternal and child health research. The Mother and Infant Research Data Analysis (MIREDA) partnership developed and implemented reproducible guidelines for mapping maternal-infant relationships and identifying pregnancy episodes within routinely collected healthcare data. Cohorts from England, Scotland, and Wales were transformed despite substantial heterogeneity in data structure, coding systems, and variable definitions. The resulting harmonised resource preserves each cohort as an independent dataset while enabling federated analyses to be conducted across sites without the need to share individual-level data. Collectively, the cohorts capture over 17.5 million live births, providing sufficient scale to investigate rare exposures and outcomes, support trial emulation, and evaluate population-level policy impacts across the UK. This article details the transformation pipeline and provides reusable methods to support extension to additional cohorts and networks. The harmonised datasets enable interoperable, reproducible research and facilitate cross-national comparative studies in maternal and child health.

16
Menopause in the All of Us Research Program: A Descriptive Summary of Electronic Health Record and Survey Response across Sociodemographic Characteristics

Staples, J. W.; White, S. L.; Giacalone, A.; Pozdeyev, N.; Sammel, M. D.; Stranger, B. E.; Valencia, C. I.; Santoro, N.; Hendricks, A. E.

2026-04-25 sexual and reproductive health 10.64898/2026.04.17.26351129 medRxiv
Top 0.1%
12.5%

Objective. Menopause is a significant physiological transition with implications for health outcomes (e.g., cardiometabolic), yet gaps remain in understanding the menopause transition, including how menopause timing and type influence health outcomes. Large-scale cohort studies in midlife (age ~40-60) females, including the All of Us Research Program (AoURP), provide opportunities to study menopause across diverse populations and data modalities. We characterized menopause-related data in AoURP, focusing on age distributions and concordance between EHR diagnosis codes and self-reported survey responses. Methods. We analyzed menopause-related survey, EHR diagnostic code, and genomic data among ~396,000 participants in AoURP with female sex. We summarized menopause data across modalities, overlap between survey, EHR, and genomic data, and age distributions overall and across sociodemographic characteristics. Results. Among ~396,000 females, surveys captured ~193,000 menopause observations, nearly seven times more than structured EHR diagnoses (~28,000), suggesting under-ascertainment in EHR data. Nearly all females (~99%) with an EHR menopause diagnosis also reported menopause in the survey. Approximately 22,000 participants had overlapping EHR, survey, and genomic menopause-related data. Survey-based age patterns matched expectations, with participants <40 years predominantly reporting pre-menopausal status and those >60 years predominantly reporting post-menopausal status. A small subset of participants aged >70 years (N≈1,700; 4%) reported no menopause, suggesting response or recall bias. EHR menopause codes were concentrated after age 45 years, with a notable spike at age 65. Modest differences in survey-based menopause age distributions were observed by sociodemographic characteristics (e.g., race, ancestry). Conclusions. These findings inform sampling strategies, power calculations, phenotype definition, and study design for menopause research using AoURP.

17
Violence exposure and mental health problems among school-aged children in a South African birth cohort

Bailey, M.; Hammerton, G.; Fairchild, G.; Tsunga, L.; Hoffman, N.; Burd, T.; Shadwell, R.; Danese, A.; Armour, C.; Zar, H. J.; Stein, D. J.; Donald, K. A.; Halligan, S. L.

2026-04-22 psychiatry and clinical psychology 10.64898/2026.04.20.26351289 medRxiv
Top 0.1%
12.0%

Objective. There is little longitudinal research investigating links between violence exposure and mental disorders among children in low- and middle-income countries (LMICs), despite high rates of violence. We examined cross-sectional and longitudinal violence-mental health associations among children in a large South African birth cohort, the Drakenstein Child Health Study, including direct clinical interviews capturing children's mental disorders. Method. In this birth cohort (N=974), we assessed lifetime violence exposure and four subtypes (witnessed community, community victimization, witnessed domestic, domestic victimization) at ages 4.5 and 8 years via caregiver reports. At 8 years, caregivers completed the Child Behaviour Checklist, and psychiatric disorders were assessed using the Mini-International Neuropsychiatric Interview for Children and Adolescents, a self-report measure. We tested for associations using linear/logistic regressions, adjusted for confounders. Results. Most children (91%) had experienced violence by 8 years. Cross-sectionally, total violence exposure was associated with total (B=0.49 [95% CI 0.32, 0.66]), internalizing (0.32 [0.17, 0.47]), and externalizing problems (0.46 [0.31, 0.61]), and with increased odds of disorder at 8 years (aOR=1.09 [1.05, 1.13]). Longitudinally, total violence exposure up to 4.5 years was associated with total (B=0.27 [0.03, 0.52]), internalizing (0.24 [0.04, 0.44]), and externalizing scores (0.23 [0.008, 0.45]) at 8 years, but not with increased risk of psychiatric disorders. The strongest and most consistent associations were observed for domestic versus community violence subtypes. Conclusion. Our strong cross-sectional but weaker longitudinal findings suggest that recent violence exposures may be more critical than early exposures for children's mental health. Longitudinal exploration of other violence-affected LMIC populations is urgently needed.

18
Life Course Socioeconomic Position and health in older adulthood age: A Formal Mediation Analysis in the 1958 British Birth Cohort

Guo, Y.; Pelikh, A.; Ploubidis, G. B.; Goodman, A.

2026-03-25 epidemiology 10.64898/2026.03.23.26349085 medRxiv
Top 0.1%
11.9%

Background. Childhood socioeconomic position (SEP) is a key determinant of later-life health. Understanding the extent to which adult SEP mediates this association into early old age is important for explaining how health inequalities are propagated across generations and how they might be addressed in later life. To our knowledge, no prospective study has examined whether childhood SEP remains associated with health at the threshold of older age and the extent to which any such association is mediated by adult SEP. Methods. We used data from the 1958 British Birth Cohort, a prospective study that has followed participants since birth, drawing on earlier data collected at birth and ages 33 and 55 years and newly collected data from the age 62 sweep. Using interventional causal mediation analyses, we assessed whether adult occupational class, education, housing tenure, and income mediate associations between childhood social class (manual vs non-manual) and health at age 62 (self-rated health, C-reactive protein [CRP], cholesterol ratio, glycated hemoglobin [HbA1c], and N-terminal pro-B-type natriuretic peptide [NT-proBNP]). Findings. Associations between childhood SEP and self-rated health, CRP, cholesterol ratio, and HbA1c persisted after accounting for adult SEP. Mediation was outcome-specific and differed by sex. Among men, occupational class mediated 39% of the association with self-rated health (indirect effect RR 0.90, 95% CI 0.86, 0.95) and education mediated 27% (0.93, 0.90, 0.96). Among women, education mediated 10% (0.95, 0.91, 0.98) and housing tenure mediated 6% (0.97, 0.94, 0.99). Indirect effects for CRP were smaller, and mediation was minimal for cholesterol ratio, HbA1c, and NT-proBNP. Interpretation. Population-level improvements in adult SEP could reduce, but are unlikely to eliminate, later-life health inequalities associated with childhood SEP. Reducing these inequalities will require policies that address disadvantage in early life and improve adult financial and employment conditions. Funding. UK Economic and Social Research Council.

19
Multi-Omics characterization of biological pathways linking healthy dietary patterns to cardiometabolic disease risk across diverse populations

Han, J.; Deng, K.; Hong, Z.; Zhang, Z.; Godneva, N.; de Mutsert, R.; van Hylckama Vlieg, A.; Rosendaal, F. R.; Mook-Kanamori, D. O.; Zheng, J.-S.; Chen, Y.; Segal, E.; Li-Gao, R.; DIYUFOOD consortium,

2026-02-26 epidemiology 10.64898/2026.02.23.26346874 medRxiv
Top 0.1%
10.6%

Background and Objectives. Recent large-scale studies have consistently linked healthy dietary patterns to improved cardiometabolic health; however, the underlying biological pathways remain largely unclear, especially in non-European populations. In this study, we leverage data from four population-based cohorts (UK Biobank, NEO study, GNHS, and 10K) to investigate both common and cohort-specific biological pathways linking healthy dietary patterns to cardiometabolic disease through multi-omics profiling. Material and methods. In each cohort, we first assessed the associations between each of the five major dietary pattern scores (i.e., AMED, hPDI, DII, AHEI, and EDIH) and cardiometabolic disease risk using Cox or logistic regression models. To explore the potential mediating role, metabolomics and proteomics measurements were incorporated into the models. All models were adjusted for relevant confounders, and false discovery rate correction was applied to account for multiple testing. Results. With a total of 71,679 individuals without pre-existing cardiometabolic disease across four participating cohorts (UKB: 54,024, NEO: 4,838, GNHS: 3,201, and 10K: 9,616), we confirmed that adherence to healthy dietary patterns was associated with a 5-10% reduced risk of cardiometabolic disease. Three common biological pathways were identified: (1) mediation via large HDL particles and apolipoprotein F; (2) mediation via DNAJ/Hsp40 and triglyceride-rich lipoproteins; and (3) mediation via CRHBP-regulated HPA axis activity affecting triglyceride-rich lipoproteins. Conclusions. Our integrative multi-omics analysis across diverse populations identifies novel biomarkers that connect healthy dietary patterns with cardiometabolic risk. These findings deepen our understanding of the biological mechanisms underlying diet-related disease and hold promise for enhancing the development of precision nutrition interventions.

20
Capturing India's phenotypic diversity: Health insights from the GenomeIndia project

Mondal, D.; Bhattacharyya, C.; Shekhawat, D. S.; Tada, N. G.; Rajial, T.; Parameswaran, A. S.; Jena, D.; Datta, S.; Swain, M.; Jena, S.; Mishra, A.; Mahapatra, S.; Sathi, S. N.; Alam, M.; Ali, A.; Choudhury, P.; Ghosh, P.; Tripathi, D.; Anilkumar, S.; Ashwath, D.; Chithimmaiah, M.; Hameed, S. K. S.; Gunasegaran, R.; Singh, N.; Mala, G.; De, T.; Reza, S.; Mukherjee, A.; Prajapati, B.; Dave, B.; Yumnam, S.; Vimi, K.; Sharma, G. N.; Malik, A.; Sarma, R. J.; Vanlallawma, A.; Samartha, D. K.; G, T. S.; Kavya, P. V.; Deshpande, S.; GenomeIndia Consortium, ; Singh, K.; Sharma, P.; Raghav, S. K.; Pra

2026-04-02 public and global health 10.64898/2026.04.01.26349926 medRxiv
Top 0.1%
10.5%

Background. India represents 18% of the global population yet remains underrepresented in health research. Moreover, existing national surveys miss critical variation across its 4,600 ethnolinguistic groups. We present a comprehensive phenotypic characterisation of 81 populations from the GenomeIndia project. Methods. We analysed 67 sociodemographic, anthropometric, and blood biochemistry variables from 17,777 individuals sampled across 81 ethnolinguistic populations from India, examining population-level variation, disease reporting fractions, and age- and sex-specific life-course trends. Findings. Ethnolinguistic identity predicted health outcomes independently of administrative state, improving phenotypic variance explained by an average of 7.4%. Overall, 95% of participants had at least one abnormal biochemical or anthropometric marker, driven by low HDL (52.2%) and elevated triglycerides (43.6%). Metabolic risk, however, was highly stratified: adjusted prevalence for low HDL ranged four-fold across ancestry groups, from 17.2% to 67.7%. We also identified an "awareness gap": only 17.6% of people with hypertension and 2.2% of people with dyslipidemia were aware of their condition. This awareness gap was wider in tribal populations, in which women did not show the higher HDL levels, relative to men, that are typically seen, pointing to distinct metabolic profiles and healthcare access barriers across India. Interpretation. The Indian phenotypic landscape is highly structured along ethnolinguistic lines, where ancestry and environment both influence risk. The high systemic burden of abnormalities necessitates population-specific reference intervals. GenomeIndia provides a foundational map for precision public health, shifting the focus from state-level averages to population-specific risk profiles. Funding. This work was funded by the Department of Biotechnology, Ministry of Science and Technology, Government of India.